@JacoCheung
Collaborator

@JacoCheung JacoCheung commented Jan 7, 2026

Description

Related issue: #240
This PR adds a new model: a semantic-ID-based decoder.
Some highlights:

  1. We use the standard mcore transformer block as the backbone of our decoder. However, our jagged input and beam search require special masks, so we have to build a padded dense attention mask and invoke mcore's local SDPA implementation.
  2. We implemented a simple beam search module, which is called during evaluation.
  3. The model is not a SOTA implementation, and its convergence still needs to be confirmed. Only the Amazon Beauty dataset has been tested.
  4. Only a single GPU is supported for now.
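
To illustrate the padded dense mask mentioned in item 1, here is a minimal sketch in plain Python. It is not the code from this PR. The function name `build_padded_causal_mask`, the nested-list layout `[batch][query][key]`, and the convention that `True` means "may attend" are all assumptions made for illustration; the actual implementation works on tensors and may use the inverted convention. Beam-search-specific masking is not covered here.

```python
from typing import List, Optional


def build_padded_causal_mask(
    seq_lengths: List[int], max_len: Optional[int] = None
) -> List[List[List[bool]]]:
    """Build a dense [batch][query][key] mask from jagged sequence lengths.

    Hypothetical helper for illustration only.
    True  -> the query position may attend to the key position.
    False -> the pair is masked (future token or padding).
    """
    if max_len is None:
        max_len = max(seq_lengths, default=0)

    masks = []
    for length in seq_lengths:
        if length > max_len:
            raise ValueError("sequence length exceeds max_len")
        # A query q may see key k only if k is not in the future (k <= q)
        # and q is a real token (q < length); together these imply k < length.
        mask = [
            [k <= q < length for k in range(max_len)]
            for q in range(max_len)
        ]
        masks.append(mask)
    return masks


# Two jagged sequences of lengths 3 and 1, padded to length 3.
example = build_padded_causal_mask([3, 1])
```

Rows that belong to padded query positions are masked entirely in this sketch. With a real softmax-based attention kernel, fully masked rows can produce NaNs, so the outputs at those positions typically need to be ignored or zeroed afterwards.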

Checklist

  • Code Cleaning
  • README
  • CI pipeline (integration test)

CI:
CI

@JacoCheung JacoCheung requested review from shijieliu and z52527 January 7, 2026 08:46
@JacoCheung JacoCheung changed the title Add sid gr model with validation on amzn beauty dataset [Draft] Add sid gr model with validation on amzn beauty dataset Jan 7, 2026
@shijieliu
Collaborator

@JacoCheung and how about we move all the ops and modules under commons?

@JacoCheung JacoCheung force-pushed the junzhang/fea_sid_gr_gpt branch from 0392315 to 01c39f7 Compare January 9, 2026 03:39
@JacoCheung JacoCheung requested a review from geoffreyQiu January 9, 2026 03:40
@JacoCheung JacoCheung changed the title [Draft] Add sid gr model with validation on amzn beauty dataset Add sid gr model with validation on amzn beauty dataset Jan 9, 2026
@shijieliu shijieliu merged commit 0ff7357 into NVIDIA:main Jan 12, 2026